Abstract

In this paper, we identify some of the limitations of current-day shape matching
techniques. We provide examples of how contour-based shape matching techniques cannot
provide a good match for certain visually similar shapes. To overcome this limitation, 
we propose a perceptually motivated variant of the well-known shape context descriptor. 
We identify that the interior properties of the shape play an important role in object 
recognition and develop a descriptor that captures these interior properties. We show 
that our method can easily be augmented with any other shape matching algorithm. We 
also show from our experiments that the use of our descriptor can significantly improve 
the retrieval rates.

1. Introduction

Accurately measuring the similarity between two objects is a fundamental problem 
in many computer vision applications, and is still a largely unsolved problem. Many 
applications such as shape-matching, shape-retrieval, and shape-based object detection, 
rely on a strong and robust similarity measure. However, coming up with such a similarity 
measure has proven to be a difficult task since the definition of similarity itself is rather 
subjective. Given two cars, one might say that their similarity should be measured based 
on their colour, while others might argue that the make, and model, are better metrics 
for measuring the similarity. Given two shapes, one might justify their similarity based 
on the number of parts in the shape, while another might feel that the symmetry of the 
objects is an important criterion. 

Most pattern recognition problems are required to overcome this apparent vagueness 
in the definition of similarity and come up with a quantitative similarity (or dissimilarity)
measure between objects. Restricting ourselves to the identification of dissimilarity
between shapes, given two shapes, $S_1$ and $S_2$, dissimilarity measures try to identify the
cost of transforming the shape $S_1$ into the shape $S_2$. The more similar the shapes are to
each other, the easier it is to transform one into the other, and thus the lower the cost of
matching.

Figure 1: [Best Viewed in Colour] Examples of objects that are visually
similar to each other even though they have multiple indentations (and even breaks)
within their contours. Algorithms that perform contour-based matching of shapes cannot
be used while matching such objects. Visually similar objects are appropriately color-coded
using their bounding boxes.

The challenge that still remains is to come up with a good measure, which can give
reasonable costs between two shapes. The metric should ideally be invariant to translation,
rotation, and scaling. It should account for non-rigid shapes, i.e., be invariant to
articulations; it should be robust enough to ignore noise in the shape boundary; it should
handle deformations; and, ideally, it should permit partial matching of shapes. Most
current-day shape matching techniques cannot handle such diversity.

The task of matching two shapes is currently treated as the task of matching
their respective contours. A 2-D shape, $S$, is modeled as a surface residing in $\mathbb{R}^2$,
which has a well-defined boundary, $B$. Most algorithms sample this boundary and define
features at the sampled locations. A correspondence problem is then solved, and the
total cost of matching the two sets of features is taken as the cost of matching the
two boundaries and, therefore, the cost of matching the two shapes (Section [3.3] gives a
detailed description of the method).

While most of the shape information can usually be extracted from just the object's
contour, this is not true in cases where the objects have a strong base structure. In
such cases, indentations in their boundaries have minimal effect on the human visual
system. Figure [1] shows examples of objects that are visually similar to each other even
though some of them have multiple indentations in their contours. People tend to neglect 
these minor (or even major) indentations while perceiving the object's shape. This is in 
accordance with Gestalt psychology, which maintains that the human eye sees objects 
in their entirety before perceiving their individual parts. The gestalt effect is the form- 
generating capability of our senses, particularly with respect to the visual recognition of 
figures, and whole forms, instead of just a collection of simple lines and curves.

Due to the Gestalt effect, approaches that perform shape matching based on part
decomposition, or curve matching, will not perform well on objects such as those shown
in Figure [1]. There is a need for matching techniques that can capture this Gestalt effect.

In this paper, we propose a novel way of extracting the shape properties that capture 
the object's shape in its entirety. We show how this can help in improving the retrieval 
rates by testing on the well-known MPEG7 shape database. We also show improvements 
in performance over other recently proposed perceptually motivated techniques. 

The rest of the paper is organized as follows. Section [2] discusses some of the previous
work on shape-based object detection. We also identify some of the problems that the
recent techniques face. In Section [3], we explain our method in detail and show how it can
be used to tackle some of the problems mentioned in Section [2]. In Section [4], we provide
results from our experiments, which show an improvement over some of the recently
proposed techniques. Finally, in Section [5], we conclude the paper with directions for
future work.

2. Related Work 

Shape matching has been recognized as an important area of computer vision and 
has been actively pursued in the recent past. Some notable advances that have been 
made in this area over the past decade are discussed below. A typical approach to 
measure shape similarity is through non-rigid shape deformation [2, 0]. Such methods
measure the difficulty in transforming one shape into another. Geometrically, one can
think of a shape $S$ as a point on some low-dimensional manifold $M$, residing in some
high-dimensional space. The energy required to transform a shape $S_1$ into a shape $S_2$
can be thought of as the geodesic distance¹ of the shortest path between the two points
lying on the manifold.

Most approaches equate the task of shape-matching to the matching of the respective 
object boundaries. The shape boundaries are discretized into a set of $n$ landmark points,
$S = \{p_1, p_2, \ldots, p_n\}$, for easier representation and matching. Belongie et al. [4] showed
that these points could be located at any place on the object boundary and that they
need not be restricted to extrema points on the curve. They also proposed to describe
the shape using shape contexts at each of these sampled points. The shape context at
each sampled point is given by the relative distribution of the remaining $n - 1$ points,
which is represented as a 2-D histogram of distances and angles. 

The shape context (SC) can be made invariant to translation, rotation and scale. However,
while SC matching performs well on rigid objects, it is susceptible to articulations.
This is because the SC histogram is built from Euclidean distances and angles, which
are not preserved under articulation. To overcome this problem, Ling et al. Q proposed a variant
of SC, namely, the Inner Distance Shape Context (IDSC). The IDSC uses the inner distance (the
length of the shortest path connecting the two points, such that the path lies completely
within the shape) and the inner angle, instead of the Euclidean distance and angle, to generate
the histograms at the sampled points. The use of this changed metric makes the descriptor
invariant to articulations. Also, as suggested by Thayananthan et al. 0], they
make use of the figural continuity constraints and perform the context matching using
a dynamic programming scheme. Though IDSC looks at distances between points such
that the path connecting them lies completely within the shape's boundary, it still cannot
capture the interior density of the shape. It relies completely on the distances between
points that lie on the shape's boundary. We show how this important interior property
can be captured in Section [3.1] and show how it can be used to construct meaningful
shape descriptors in Section [3.3].

¹ In this paper, we will use the terms cost, distance, and energy interchangeably.

Bronstein et al. jTj tackle the problem of partial similarity and show how objects that 
have large similar parts (but not completely similar) can be matched. They present a 
novel approach, which shows how partiality can be quantified using the notion of Pareto 
optimality. They use inner distance in order to handle non-rigid objects [sj. The notion 
of Pareto optimality has since been applied by other authors for measuring partiality of 
shapes.

Gopalan et al. [10] identified that though the use of inner distance provided invariance
to articulations, it could not be directly applied to "non-ideal" 2-D projections of 3-D 
objects. If the projection took place using a weak perspective, then not all parts of the 
3-D model would get accurately projected onto the 2-D plane. In order to overcome this 
problem, they modeled an articulating object as a combination of approximate convex 
parts and performed affine normalization of these parts. They then use inner distance 
to perform shape matching on the normalized shapes. Their near-convex decomposition 
algorithm takes as input the contour of the object and splits the object into multiple 
convex parts. However, such an approach cannot be followed for shapes such as those
shown in Figure [1], since the algorithm would split the object into multiple parts, yielding
undesirable results.

The Medial Axis Transform (MAT) and its variant, shock graphs, have been used by
certain authors for matching shapes [11], [12]. The medial axis, or skeleton, is the locus
of the centers of all maximally inscribed circles of the object. While the MAT captures
the interior properties of the shape to a large extent, by definition, the generation of a
skeleton depends on the boundary of the object. Therefore, the objects shown in Figure
[5] will all have vastly different skeletons. Xie et al. [13] proposed to model shapes using
skeletal contexts. Their contexts are calculated at the skeleton endings and the bins
are populated by the non-uniformly sampled points from the boundary. Relying on the
skeleton, and the boundary points, makes their method susceptible to indentations in
the contour. We show, in Section [3], how our method does not fall prey to such boundary
perturbations.

Due to the diversity involved in shape-matching, it has become difficult to come up 
with a single measure that incorporates all the requirements. While the use of Euclidean 
distance is beneficial for identifying certain classes of objects, the use of inner distance 
favours some others. As a result, researchers have started to fuse two or more techniques 
while calculating the distance between two shapes. Ling et al. [14] identified that the use
of inner distance was "overkill" for certain classes of objects and proposed a technique to 
balance deformability and discriminability. They calculate the cost between two shapes 
with the help of various distance measures, parameterised by an aspect weight, and 
retain the "best" cost. However, they still use points sampled from the contour and their 
algorithm would therefore be susceptible to objects with strong base structures that have 
indentations in their contours. 

Recently, some effort has gone into the development of perceptually motivated techniques
[15], [16]. These techniques tackle cases such as those shown in Figure [2]. To the
human visual system, all objects in the figure appear to belong to the same class. However,
measures that rely on the contour to obtain the object's shape properties cannot
capture this similarity.

Figure 2: More examples of a particular class of objects from the MPEG7
database. All of the above objects have different contour properties. However, their
overall visual similarity is still that of a pentagon.

Temlyakov et al. [15] propose to split the object into a base structure and multiple
strand structures. They define strands as structures that are thin and long, relatively
small in size, and attached to a base structure. Strands may be inward or outward.
When comparing two shapes, they compare the base structures
and strand structures separately. They use IDSC for comparing base structures, and for
the strands, they just check whether the two objects have a similar number of strands, without
giving much importance to the detailed geometry. Secondly, they also identify objects
with a single axis of symmetry and normalize the aspect ratios of the two shapes before 
comparison. Fusing these two strategies along with IDSC helps them achieve better 
retrieval rates. Such an approach will work well if the object has a strong base structure. 
However, in many cases, the objects do not have a well-defined base. Even in the case 
of a strong base structure, multiple parameters, such as area, length, and width, have to 
be set to identify the strands. 

More recently, Hu et al. [16] proposed a morphological approach to model human
perceptions. To "close" the objects, they perform morphological closing on the shapes. 
They compare the shapes using IDSC before and after performing the morphological 
operation, and retain the better of the two costs. They perform the morphological 
operations over multiple scales. This calls for an additional scale parameter to be set. 
Secondly, selecting the structuring element for performing the closing operation is also 
a difficult task. In their experiments, they try using structuring elements of different 
sizes and report results from all sizes. In the next section, we explain our novel method 
of capturing the shape properties in their entirety, and in Section [4] we show that our
method can help generate better retrieval results than [15] and [16].

All the techniques described above were directed towards the development of a good
distance measure between pairs of images, where the similarity of an object was influenced
by just one other object. However, recent works have shown that an improvement
in the retrieval performance can be achieved if other similar shapes are allowed to influence
the pair-wise scores. For a given similarity measure, a new similarity measure
is learned through graph transduction [17]. Many methods that focus on improving the
transduction algorithms have been proposed in the recent past [18, 19, 20].

Starting the diffusion with a good similarity matrix will lead us to obtain better similarities
at the end. A good similarity matrix is one in which similar shapes have high
affinity. We show that our method helps in generating a better similarity matrix after
the diffusion process. We use the Locally Constrained Diffusion Process (LCDP) [21] to
learn the manifold structure of the shapes and show, in Section [4], that our matrix is able
to generate highly competitive retrieval rates.



3. Solid Shape Context 

In the previous section, we reviewed past work in the area of shape matching and 
pointed to the fact that more research needs to be done in the development of perceptually 
motivated techniques. In this section, we introduce one such perceptually motivated 
technique, which can capture the shape properties in their entirety. 

To motivate our work, let us go back to the examples in Figures [1] and [2]. We identify
that the human visual system recognises shapes not only by their external contour, but
also by their "density". We perceive a solid disc as a different object compared to a ring,
though both have a circle as their outer contour. From this example, we can see that the
interior solidity plays an important role in the identification of an object. We propose to
utilize this important interior property of a shape by coming up with a descriptor, which 
is a variant of the well-known Shape Context descriptor. 

To capture the interior properties of a shape, we propose to sample a set of uniformly
spaced Dense Points that lie within the object's body (shown as the
blue points in Figure [4d]). We then sample a much smaller set of points, called Sparse
Points, where we compute the object's features (crosses in Figure [4e], sampled along the
convex hull). The computed features are our modifications of the shape context, and are
described using the previously sampled Dense Points. Given two shapes, the contexts at 
their respective Sparse Points are used for comparison. In the following subsections, we 
describe in detail how we sample the dense and sparse points, and how our Solid Shape 
Context (SSC) is computed. 

3.1. Dense Points 

Motivated by the sampling techniques used to approximate probability density func- 
tions, we propose to approximate the interior shape of an object by sampling points 
lying within the object's boundary. Each part of the object is equally important in 
understanding the shape properties. Therefore, we use a uniform sampling scheme to
sample points that lie uniformly within the shape. 

The issues that we face while sampling from an arbitrary shape are similar to the
issues that we face while sampling from an arbitrary distribution. Uniformly sampling
from a well-known and simple shape, such as a square, rectangle, circle, or a triangle,
is relatively straightforward. However, uniformly sampling a fixed set of points from a
random shape is not that simple.

Figure 3: In order to sample points from within a circle, we uniformly sample points
from a bounding square. We retain the points that fall within the circle (green) and
reject the points that fall outside the circle (red).

One common technique that could be adopted is the rejection sampling technique. We 
can encompass the arbitrary shape using a well known, and simple shape (say, circle or 
a square), and uniformly sample points from within it. We can then retain only those 
points that fall within the shape boundary and reject the rest that lie outside the shape. 
Figure [3] gives an illustration of the rejection sampling technique. 
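
The rejection sampling idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `rejection_sample`, the `inside` predicate, and the `max_tries` guard are hypothetical and do not come from the paper. The sketch also counts the rejected draws, which is the waste discussed in this section.

```python
import random

def rejection_sample(inside, bbox, n, rng=None, max_tries=1_000_000):
    """Draw n uniform points from a shape by rejection sampling.

    inside: predicate (x, y) -> bool, True when the point lies in the shape.
    bbox:   (xmin, ymin, xmax, ymax) of a rectangle enclosing the shape.
    Returns (accepted_points, number_of_rejected_draws).
    """
    rng = rng or random.Random()
    xmin, ymin, xmax, ymax = bbox
    accepted, rejected = [], 0
    for _ in range(max_tries):
        if len(accepted) == n:
            break
        x = rng.uniform(xmin, xmax)
        y = rng.uniform(ymin, ymax)
        if inside(x, y):
            accepted.append((x, y))   # point falls inside the shape: keep it
        else:
            rejected += 1             # point falls outside the shape: wasted draw
    return accepted, rejected

def in_unit_circle(x, y):
    return x * x + y * y <= 1.0

# Sample 500 points from a unit circle enclosed in its bounding square.
points, wasted = rejection_sample(in_unit_circle, (-1, -1, 1, 1), 500, random.Random(0))
```

For the circle, the expected fraction of rejected draws is 1 - π/4, roughly 21%; for thin or spread-out shapes the fraction is much larger.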

While rejection sampling is a very simple method, there are some issues that we
encounter. It is difficult to efficiently sample a fixed number of points lying inside the
shape boundary without wasting samples. For shapes with elongated parts, such as the
tentacles of an octopus, the accept/reject method wastes a number of samples, which
is proportional to the ratio of the areas of the bounding rectangle and the object;
this ratio can be quite high for objects with parts spread over a large region, such as
horseshoes, octopi, or insects. Even for simple shapes such as the circle in Figure [3], a
large number of points (shown in red) are wasted: a circle fills only π/4, about 79%, of its
bounding square. Our method, which is described below,
does not waste any samples, and is therefore able to maintain a constant complexity
regardless of the shape of the object.

The above problems are encountered because of two reasons: 

1. We are trying to sample points from arbitrary shapes, for which there are no elegant 
sampling techniques. 

2. We are not restricting ourselves to the interior of the shape, before sampling. 

We wish to overcome these problems by making use of the object's boundary con- 
straints. Firstly, we restrict our sampling area such that it lies totally within the object's 
boundary. Secondly, we ensure that the area we are sampling from is a simple shape, 
to ensure easy sampling. Below, we explain in detail how we propose to sample a fixed 
number of Dense Points without wasting any samples.

Given a shape $S$, we can easily extract its boundary, $B$. We then sample a set of
uniformly spaced points, $B_P = \{B_{P_1}, B_{P_2}, \ldots, B_{P_{N_B}}\}$, that lie on the boundary of the
object (Figure [4b]). A point $B_{P_i}$ neighbours just two other points, $B_{P_{i-1}}$ and $B_{P_{i+1}}$ (the
indices are taken modulo $|B_P|$). We make use of this neighbourhood constraint and perform
a Constrained Delaunay Triangulation (CDT) of these points. A CDT ensures that
the edges specified as the constraints are retained in the triangulation process [22]. The
constraints that we specify are the neighbourhood constraints, i.e., our constraint ensures
that a point $B_{P_i}$ has an edge to its two neighbours $B_{P_{i-1}}$ and $B_{P_{i+1}}$. Once the triangulation
is performed, we remove the triangles that lie in the concavities and holes of a shape
[23]. This guarantees that the remaining triangles lie totally within
the object's boundary. For a given set of points on the boundary, such a Constrained
Delaunay Triangulation produces $N_B - 2$ triangles, $Tri^S = \{Tri_1^S, Tri_2^S, \ldots, Tri_{N_B-2}^S\}$.
Figure [4c] shows the output of the constrained triangulation. Notice that all the triangles
now lie within the object's boundary, especially at the bottom left of the butterfly where
there is a noisy indentation.
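
To make the triangle count concrete, the sketch below triangulates a small boundary polygon. It deliberately swaps in plain ear clipping instead of a Constrained Delaunay Triangulation, so the triangles differ from those a CDT would produce; what it shares with the text is that every triangle lies inside the boundary and that a boundary with N points yields N - 2 triangles. It assumes a simple polygon without holes, given in counter-clockwise order, and all function names are hypothetical.

```python
def cross(o, a, b):
    """Twice the signed area of triangle o-a-b (positive when counter-clockwise)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def point_in_triangle(p, a, b, c):
    """True when p lies inside or on the counter-clockwise triangle a-b-c."""
    return cross(a, b, p) >= 0 and cross(b, c, p) >= 0 and cross(c, a, p) >= 0

def triangulate(boundary):
    """Ear-clip a simple polygon given as counter-clockwise boundary points.

    Returns len(boundary) - 2 triangles, all lying inside the polygon.
    """
    idx = list(range(len(boundary)))
    triangles = []
    while len(idx) > 3:
        for k in range(len(idx)):
            i, j, l = idx[k - 1], idx[k], idx[(k + 1) % len(idx)]
            a, b, c = boundary[i], boundary[j], boundary[l]
            if cross(a, b, c) <= 0:
                continue  # reflex or degenerate corner: cutting it would leave the shape
            others = (boundary[m] for m in idx if m not in (i, j, l))
            if any(point_in_triangle(p, a, b, c) for p in others):
                continue  # another boundary point blocks this ear
            triangles.append((a, b, c))
            idx.pop(k)
            break
        else:
            raise ValueError("boundary must be a simple, counter-clockwise polygon")
    triangles.append(tuple(boundary[m] for m in idx))
    return triangles

# A square with a deep notch in its top edge (five boundary points).
notched = [(0, 0), (4, 0), (4, 4), (2, 1), (0, 4)]
tris = triangulate(notched)
```

No triangle covers the notch itself, which mirrors the removal of triangles that fall in concavities.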

With the above formulation, it becomes easy to sample a fixed number of points such
that they lie completely within the object's boundary. For any triangle, $Tri_i^S$, with
vertices $X$, $Y$, and $Z$, a random point $p$, lying inside the triangle, can be generated using

$$p = (1 - \sqrt{r_1})\,X + \sqrt{r_1}\,(1 - r_2)\,Y + \sqrt{r_1}\,r_2\,Z, \tag{1}$$

where $r_1, r_2 \in [0, 1]$ are two random numbers, independent of each other [24]. $X$, $Y$,
and $Z$ are 2-D vectors containing the $x$ and $y$ coordinates of the three vertices. The
set of points lying inside $Tri_i^S$ is given by $\mathcal{P}^{Tri_i^S} = \{p_1^{Tri_i^S}, p_2^{Tri_i^S}, \ldots\}$. In order
to generate $|\mathcal{P}^{Tri_i^S}|$ points lying inside the triangle $Tri_i^S$, all we have to do is generate
$|\mathcal{P}^{Tri_i^S}|$ pairs of random numbers $(r_1, r_2)$ from a uniform distribution and use Equation
[1] to generate the points.
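
Equation [1] translates directly into code. The following is a minimal sketch; the function name `sample_in_triangle` and the tuple representation of the vertices are illustrative choices, not part of the paper.

```python
import math
import random

def sample_in_triangle(X, Y, Z, rng):
    """Return one uniformly distributed point inside triangle XYZ (Equation 1)."""
    r1, r2 = rng.random(), rng.random()
    s = math.sqrt(r1)
    # Barycentric weights of X, Y and Z; they are non-negative and sum to 1.
    a, b, c = 1.0 - s, s * (1.0 - r2), s * r2
    return (a * X[0] + b * Y[0] + c * Z[0],
            a * X[1] + b * Y[1] + c * Z[1])

# Draw 1000 points from the triangle with vertices (0,0), (1,0), (0,1).
rng = random.Random(1)
pts = [sample_in_triangle((0, 0), (1, 0), (0, 1), rng) for _ in range(1000)]
```

The square root on $r_1$ is what makes the distribution uniform over the area; using $r_1$ directly would crowd the points towards the vertex $X$.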

In order to generate $N_{DP}$ uniformly distributed Dense Points lying within
the object's boundary (Figure [4d]), we generate $|\mathcal{P}^{Tri_i^S}|$ uniformly distributed points from
within each triangle $Tri_i^S$, $\forall i \in \{1, 2, \ldots, N_B - 2\}$, such that $|\mathcal{P}^{Tri_i^S}|$ is proportional to the
area of triangle $Tri_i^S$, as given in Equation [2]:

$$|\mathcal{P}^{Tri_i^S}| = \frac{A_{Tri_i^S}}{\sum_{j=1}^{N_B - 2} A_{Tri_j^S}}\, N_{DP}, \tag{2}$$

where $A_{Tri_i^S}$ is the area of triangle $Tri_i^S$. Therefore,

$$\sum_{i=1}^{N_B - 2} |\mathcal{P}^{Tri_i^S}| = N_{DP}. \tag{3}$$

We have shown how we can generate a fixed number of points from within any random 
shape, easily. Our method overcomes the two problems that were previously listed in this 
section. Firstly, we restrict the sampling area to lie within the shape, thus preventing 
any sampled points from being wasted. Secondly, we sample from a very simple polygon, 
a triangle, thus making the sampling of uniformly spaced random points quick and easy. 
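
The area-proportional allocation of Equation [2] gives real-valued counts, so an implementation has to round them while keeping the total fixed. The sketch below uses largest-remainder rounding, which is an assumption of this example (the paper does not say how fractional counts are handled); the names `triangle_area` and `allocate_counts` are hypothetical.

```python
def triangle_area(X, Y, Z):
    """Area of triangle XYZ from the cross product of two edge vectors."""
    return abs((Y[0] - X[0]) * (Z[1] - X[1]) - (Z[0] - X[0]) * (Y[1] - X[1])) / 2.0

def allocate_counts(triangles, n_dp):
    """Split n_dp points over the triangles in proportion to their areas."""
    areas = [triangle_area(*t) for t in triangles]
    total = sum(areas)
    exact = [a / total * n_dp for a in areas]   # Equation 2, real-valued
    counts = [int(e) for e in exact]            # round down first
    leftover = n_dp - sum(counts)
    # Give the remaining points to the triangles with the largest fractional parts.
    order = sorted(range(len(exact)), key=lambda i: exact[i] - counts[i], reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    return counts

# Three triangles with areas 0.5, 0.5 and 1.0 sharing 2000 Dense Points.
example = [((0, 0), (1, 0), (1, 1)), ((0, 0), (1, 1), (0, 1)), ((0, 0), (2, 0), (0, 1))]
counts = allocate_counts(example, 2000)
```

Because the counts always sum to the requested total, no sample is discarded, in contrast to rejection sampling.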





Figure 4: [Best Viewed in Color] (a) The silhouette of a butterfly
with a noisy indent in the contour. (b) Uniformly sampled boundary points, $B_P$, from
the contour. (c) Output of the Constrained Delaunay Triangulation. The constraint is
a simple neighbourhood continuity constraint of the sampled contour points. All the
triangles now lie within the object's boundary. (d) Dense Points sampled from inside
each triangle according to Equations [1] and [2]. (e) Sparse Points, represented by crosses
(zoom into the figure), are sampled from the boundary of the object's convex hull. The Solid
Shape Context histogram is computed using log-polar bins at each Sparse Point. (f) A
visualization of the Solid Shape Context (SSC) histogram.

These densely sampled points approximate the interior density of a shape. Our shape
descriptor models the shape in its entirety by making use of these Dense Points. The 
SSC shape descriptors are generated at each Sparse Point location. A discussion of how 
to select the location of these Sparse Points is given in the following subsection. 

3.2. Sparse Points 

A shape $S$ is described using SSC at $N_{SP}$ locations, where $N_{SP} \ll N_{DP}$. Because
they are relatively few in number compared to the Dense Points, we call them
Sparse Points. It is usually enough to generate the shape descriptors at this sparse
set of locations, instead of generating them at each dense point.

The next question that arises is, how and where on the object to localise these sparse 
points. Ideally, we would like these feature locations to be uniformly spread across the 
object. We want the descriptors to describe the shape from a varied number of vantage 
points. One way to do this would be to generate a minimal enclosing rectangle for the 
object, and uniformly divide the rectangle into $N_{SP}$ cells, and mark the centers
of these cells as the locations of the Sparse Points. However, doing so would not enable 
us to make use of the continuity constraints while comparing the descriptors between 
two shapes. 

Another approach could be to make use of the uniformly sampled points on the boundary,
$B_P$, as the Sparse Points, similar to the boundary sampling used in Q and Q. While
this would enable us to make use of the continuity constraints that occur naturally, it
would lead to certain erroneous matches, resulting in increased costs of matching
two shapes. These erroneous matches would occur in cases where there are strong
indentations in the boundary of the object, such as the examples shown in Figure [1]. All
the descriptors at the landmark points that lie on the indentations will have a vastly
different representation of the shape compared to the descriptors that are extracted from
a shape without similar (or any) indentations. Thus, selecting the set of points, $B_P$, as
the landmark points does not seem to be a good idea.

To retain the advantage of the continuity constraints and still have Sparse Points that 
are independent of the indentations in the contour, we propose to sample the feature 
point locations along the boundary of the convex hull of the shape. Sampling landmark 
points along the convex hull gives us many advantages. Since the convex hull encloses 
the object completely, we retain the advantage of having the descriptors describe the 
object from various vantage points. Secondly, sampling from the convex hull gives us 
larger insensitivity to boundary perturbations. Along with the densely sampled points, 
which help in handling noisy indentations, sampling along the convex hull also prevents 
such indentations from unnecessarily affecting the landmark selection. Thirdly, sampling 
along the convex hull gives a better rotation invariance to the descriptor. Rotation in- 
variance is usually added to the descriptor by tangent angle normalization. Calculating 
the tangent angle on the boundary of the convex hull gives better invariance to rotation 
than when the normalization angle is calculated using the tangent on a noisy contour. 
Such unwanted perturbations in the boundary would randomly skew the tangents along 
the boundary, thus causing large amounts of noise to be added during the angle normalization
step. Finally, using the convex hull can be an advantage even when the shapes
are highly concave. Since our sampling procedure ensures that the sampled points always
lie inside the shape boundary, the absence of dense points in the concavities of the shape
helps capture the concave properties of the shape. For example, the characteristic property of a
horseshoe is its concavity, and this property is captured in our shape descriptor by means
of zero-height bins (see the following subsection).

Due to the above-mentioned advantages, similar to $B_P$, we obtain the set of Sparse
Points, $\mathcal{SP}^S = \{\mathcal{SP}_1^S, \mathcal{SP}_2^S, \ldots, \mathcal{SP}_{N_{SP}}^S\}$, for shape $S$, sampled along the convex hull of
the shape, and compute the SSC descriptor at each of these points. Figure [4e] shows how
the sparse points are sampled from the object's convex hull. Notice the insensitivity of 
the Sparse Points to the indentations in the butterfly's boundary. 
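
One way to obtain Sparse Points of this kind is to compute the convex hull of the boundary points and then walk along it at equal arc-length steps. The sketch below assumes equal arc-length spacing, which the text does not state explicitly, and uses Andrew's monotone chain for the hull; `convex_hull` and `sample_along_polygon` are illustrative names.

```python
import math

def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def sample_along_polygon(vertices, n):
    """Place n points at equal arc-length spacing along a closed polygon."""
    edges = list(zip(vertices, vertices[1:] + vertices[:1]))
    lengths = [math.dist(a, b) for a, b in edges]
    perimeter = sum(lengths)
    samples, edge, start = [], 0, 0.0   # start = arc length where the current edge begins
    for k in range(n):
        target = k * perimeter / n
        while edge < len(edges) - 1 and target > start + lengths[edge]:
            start += lengths[edge]
            edge += 1
        (ax, ay), (bx, by) = edges[edge]
        t = (target - start) / lengths[edge] if lengths[edge] > 0 else 0.0
        samples.append((ax + t * (bx - ax), ay + t * (by - ay)))
    return samples

# Boundary of a square with a deep notch; the notch vertex (2, 1) is not on the hull.
boundary = [(0, 0), (4, 0), (4, 4), (2, 1), (0, 4)]
hull = convex_hull(boundary)
sparse = sample_along_polygon(hull, 8)
```

The notch changes the contour but not the hull, so the sampled locations are the same as for an unnotched square.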

Now that we have a set of Dense Points that can be used to model the interior of the 
shape, and a set of Sparse Points to represent the shape, we go on to describe how we 
generate our SSC descriptor using both these sets of points. 

3.3. Solid Shape Context Descriptor

At each sparse point $\mathcal{SP}_i^S$, we generate a 2-D histogram

$$\mathcal{H}_i^S(k) = \#\{\mathcal{DP}_j^S : \mathcal{DP}_j^S \in \mathrm{bin}(k)\}, \tag{4}$$

where $\mathcal{DP}_j^S$ is the $j$-th Dense Point, $k$ is the bin number, $i \in \{1, 2, \ldots, N_{SP}\}$, and
$j \in \{1, 2, \ldots, N_{DP}\}$. Similar to @, we use 8 distance bins and 12 angular bins to generate
the log-polar histogram. We use the Euclidean distance and Euclidean angle (similar
to [^) to calculate the distance, and angle, between a Sparse Point and a Dense Point. A
given shape $S$ can now be described by a set of histograms, $SSC^S = \{\mathcal{H}_1^S, \mathcal{H}_2^S, \ldots, \mathcal{H}_{N_{SP}}^S\}$.
Similar to SC and IDSC, SSC is inherently invariant to translation. It can be made invariant
to rotation and scale by tangent normalization and mean distance normalization,
respectively. Figure [4f] gives a visualization of the SSC histogram for one of the sparse
locations.
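
A log-polar histogram of this kind can be sketched as follows. The 8 distance bins and 12 angular bins come from the text; the radial limits `r_min` and `r_max` (in units of the mean distance) and the clamping of out-of-range distances are assumptions of this example, and tangent-angle normalization for rotation invariance is omitted for brevity.

```python
import math

def ssc_histogram(sparse_pt, dense_pts, n_r=8, n_theta=12, r_min=0.125, r_max=2.0):
    """Log-polar histogram of the dense points as seen from one sparse point.

    Distances are divided by their mean (scale normalization). Returns an
    n_r x n_theta nested list of counts.
    """
    offsets = [(x - sparse_pt[0], y - sparse_pt[1]) for x, y in dense_pts]
    dists = [math.hypot(dx, dy) for dx, dy in offsets]
    mean_d = sum(dists) / len(dists)
    hist = [[0] * n_theta for _ in range(n_r)]
    log_lo, log_hi = math.log(r_min), math.log(r_max)
    for (dx, dy), d in zip(offsets, dists):
        r = d / mean_d
        if r <= 0:
            continue  # dense point coincides with the sparse point
        # Radial bin with log-spaced edges; out-of-range distances are clamped.
        frac = (math.log(r) - log_lo) / (log_hi - log_lo)
        r_bin = min(n_r - 1, max(0, int(frac * n_r)))
        # Angular bin from the angle of the offset vector, measured in [0, 2*pi).
        theta = math.atan2(dy, dx) % (2 * math.pi)
        t_bin = min(n_theta - 1, int(theta / (2 * math.pi) * n_theta))
        hist[r_bin][t_bin] += 1
    return hist

# Dense points on a unit-square grid, observed from a point to the right of the square.
dense = [(i / 10, j / 10) for i in range(11) for j in range(11)]
hist = ssc_histogram((2.0, 0.5), dense)
```

All dense points lie to the left of the observation point here, so every angular bin facing away from the square stays at zero height.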

Given two shapes $S_1$ and $S_2$, matching them now boils down to matching their respective
histogram sets, $SSC^{S_1}$ and $SSC^{S_2}$. The goal of the matching stage is to find a
mapping function $\phi$, which minimizes the cost of mapping the histogram $\mathcal{H}_i^{S_1}$ to $\mathcal{H}_{\phi(i)}^{S_2}$.
The total cost of matching shape $S_1$ to shape $S_2$ is given by

$$\Psi_{SSC}(S_1, S_2) = \sum_{i=1}^{N_{SP}} \psi\left(\mathcal{H}_i^{S_1}, \mathcal{H}_{\phi(i)}^{S_2}\right). \tag{5}$$

The distance between two histograms, $\psi(\mathcal{H}_i^{S_1}, \mathcal{H}_{\phi(i)}^{S_2})$, is defined by the test statistic.
If the distance between the two histograms is greater than an acceptable threshold $\tau$ (= 0.6),
we set the distance to equal $\tau$, and set $\phi(i)$ to 0, which means that we were
not able to find a suitable match for $\mathcal{H}_i^{S_1}$ in shape $S_2$. Similar to Q, we use a dynamic
programming scheme to match the two sets of histograms.

Finally, the true cost between the two shapes $S_1$ and $S_2$ can be computed as

$$\Psi(S_1, S_2) = \min(\Psi_{IDSC}, \alpha \cdot \Psi_{SSC}), \tag{6}$$

where $\Psi_{IDSC}$ is the cost of matching the two shapes using the standard IDSC method
0, $\Psi_{SSC}$ is given by Equation [5], and $\alpha$ is a normalization constant, which is used to
normalize the two costs. Fusing two or more costs to obtain the smallest cost has become
popular in the recent past and is used in [3], [l^ and (lif . 
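
The cost computation of Equations [5] and [6] can be sketched as below, under several loudly stated assumptions. The histogram distance is taken to be the chi-squared statistic used by the original shape context, which the text here does not name explicitly. The mapping `phi` is taken as given, since the dynamic programming search is not reproduced, and an unmatched histogram is marked with `None` rather than 0. Histograms are passed as flat lists of bin counts, and all function names are hypothetical.

```python
def chi2_distance(h1, h2):
    """0.5 * sum((a - b)^2 / (a + b)) over the bins of two normalized histograms."""
    s1, s2 = float(sum(h1)), float(sum(h2))
    cost = 0.0
    for a, b in zip(h1, h2):
        a, b = a / s1, b / s2
        if a + b > 0:
            cost += (a - b) ** 2 / (a + b)
    return 0.5 * cost

def ssc_cost(hists1, hists2, phi, tau=0.6):
    """Equation 5 for a given mapping phi (phi[i] is None when unmatched)."""
    total = 0.0
    for i, h in enumerate(hists1):
        j = phi[i]
        # Unmatched or poorly matched histograms contribute the threshold tau.
        total += tau if j is None else min(tau, chi2_distance(h, hists2[j]))
    return total

def fused_cost(cost_idsc, cost_ssc, alpha=4.0):
    """Equation 6: the smaller of the IDSC cost and the rescaled SSC cost."""
    return min(cost_idsc, alpha * cost_ssc)

# Two toy shapes with two histograms each; the first pair is completely different.
cost = ssc_cost([[1, 0], [1, 1]], [[0, 1], [1, 1]], phi=[0, 1])
final = fused_cost(3.0, cost)
```

Capping each term at `tau` keeps a single badly matched location from dominating the total cost.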

Figure [4] illustrates all the steps involved in the generation of the SSC shape descriptor.
In the next section, we demonstrate the effectiveness of our SSC descriptor using the 
results obtained from our experiments.

4. Experiments and Results



We use the well-known and widely used MPEG7 CE-Shape-1 Part B dataset for testing our
algorithm. The database consists of 1400 silhouette images split into 70 classes, with
each class containing 20 example images. It contains both rigid and non-rigid objects,
with varied levels of translation, rotation, scale, articulation, deformation, and
occlusion. The objects belonging to a particular class are similar not only in their
contour properties but also in their overall visual appearance. The database is
considered challenging because there are many instances where the inter-class object
similarity is higher than the intra-class object similarity. Figure 5 shows an example
object from each of the 70 classes.

The performance of the algorithm on the database is measured by the Bullseye score.
To calculate the Bullseye score, each image is compared to every other image in the
database, and the 40 best-matching images are retained. Since each class contains 20
images, at most 20 of these can belong to the same class as the query. The number of
retrieved objects that belong to the same class as the query (template) image is counted
and divided by 20 to give the Bullseye score for that image. The average over all the
images in the database gives the Bullseye score for the complete database.
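The Bullseye computation can be sketched from a pair-wise distance matrix and a vector of class labels. This is an illustrative sketch under one stated assumption: the query itself is kept in the candidate list (its self-distance is normally the smallest), which is what allows a perfect score of 20 out of 20 per query. The name `bullseye_score` is hypothetical.

```python
import numpy as np


def bullseye_score(dist, labels, top_k=40, class_size=20):
    """Average fraction of same-class shapes among the top_k matches."""
    dist = np.asarray(dist, dtype=float)
    labels = np.asarray(labels)
    # rank candidates for every query; a stable sort breaks ties by index
    order = np.argsort(dist, axis=1, kind="stable")[:, :top_k]
    # hits[q, r] is True when the r-th match of query q shares its class
    hits = labels[order] == labels[:, None]
    return hits.sum(axis=1).mean() / class_size
```

For the MPEG7 setting described above, the defaults `top_k=40` and `class_size=20` apply; smaller values are convenient for checking the function on toy data.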

In our experiments, we set $N_{jg} = 100$ for triangulating the shape, $N_{SP}$ to 300,
$N_{DP}$ to 2000, and $\alpha$ to 4. The mean of the costs obtained from IDSC was about four
times the mean of all the SSC costs, and the range of the SSC costs was also smaller than
the range of the IDSC costs; hence the choice of $\alpha = 4$. We set $N_{SP}$ to 300 because
previous works [14] have used 300 feature points for shape comparison. Minor
improvements were seen in the Bullseye score for $N_{SP} > 300$, and no major decrease in
performance was seen for $N_{SP} < 300$. A coarse representation of the shape is sufficient
for interior sampling. With $N_{jg} = 50$, the overall shape boundary was not decipherable
for some highly convoluted shapes, so we increased it to 100. Further increases (200, 300,
and 400) did not yield any major improvement in the overall results. We also experimented
with larger values of $N_{DP}$ (2500, 3000, 3500, and 4000), but did not find any significant
improvement in the Bullseye score. Table 1 lists our Bullseye score along with the
Bullseye scores of various other algorithms.
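The choice of the normalization constant from the ratio of mean costs can be illustrated with a short sketch. It assumes the means are taken over all entries of the two pair-wise cost matrices, which the text does not spell out; `estimate_alpha` is a hypothetical helper name.

```python
import numpy as np


def estimate_alpha(idsc_costs, ssc_costs):
    """Ratio of the mean IDSC cost to the mean SSC cost."""
    idsc_costs = np.asarray(idsc_costs, dtype=float)
    ssc_costs = np.asarray(ssc_costs, dtype=float)
    # multiplying the SSC costs by this ratio equalizes the two means
    return idsc_costs.mean() / ssc_costs.mean()
```

On cost matrices whose means differ by a factor of about four, this returns a value close to 4, which matches the constant used in the experiments.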

As can be seen from Table 1, a considerable amount of work has been done in the area of
shape matching. We fuse the costs from our algorithm with the costs from IDSC. Doing so
significantly improves the Bullseye score from 85.40% to 91.65%. The objects in the
MPEG7 database also have different aspect ratios. Performing aspect normalization of
shapes, as in [15], improves the Bullseye score further, to 91.83%. Temlyakov et al. [15]
perform a similar fusion, and their algorithm improves the Bullseye score to 88.39%,
while the method of Hu et al. [16] improves the score to 90.18%. We specifically compare
our algorithm to these two methods as they are also perceptually motivated techniques.
We note that the method in [15] requires the setting of threshold parameters for the
identification of strand structures, and the method in [16] requires the selection of an
appropriate structuring element and the identification of a proper scale at which to
perform the morphological closing operation. Our method achieves a better Bullseye score
without requiring such additional parameters.

Algorithm                                       Bullseye Score
Visual Parts [25]                               76.45%
SC+TPS [4]                                      76.51%
Generative Model [26]                           80.03%
Curvature Scale Space [27]                      81.12%
SSC                                             82.39%
Polygonal Multiresolution [28]                  84.33%
Multiscale Representation [29]                  84.93%
IDSC [2]                                        85.40%
Symbolic Representation [30]                    85.92%
Hierarchical Procrustes Matching [31]           86.35%
IDSC(EMD) [32]                                  86.53%
Triangle Area [33]                              87.23%
Shape Tree [3]                                  87.70%
ASC [14]                                        88.30%
IDSC+AspectNorm.+StrandRemoval [15]             88.39%
Contour Flexibility [34]                        89.31%
IDSC+PMMS [16]                                  90.18%
IDSC+LP [20]                                    91.00%
IDSC+SSC                                        91.65%
IDSC+AspectNorm.+SSC                            91.83%
IDSC+LCDP [21]                                  92.36%
IDSC+Affine Normalization [10]                  93.67%
IDSC+AspectNorm.+StrandRemoval+LCDP [15] [21]   95.60%
ASC+LCDP [14] [21]                              95.96%
IDSC+PMMS+LCDP [16] [21]                        98.56%
IDSC+SSC+LCDP                                   98.85%
IDSC+Affine Normalization+TPG [19]              99.99%

Table 1: A comprehensive list of shape-matching techniques proposed in the literature,
along with their respective Bullseye scores. Our method significantly improves the
Bullseye score when fused with IDSC. Diffusion techniques, such as LCDP, further improve
our Bullseye score.

Figure 5: Example images from the MPEG7 database, one from each class. As can be seen,
the database contains both rigid and non-rigid objects.

Figure 6(a) shows class-specific Bullseye scores for both IDSC and SSC. We can see that
SSC performs better than IDSC for classes 21 through 32. These are the classes in which
the objects have many indentations. Conversely, IDSC performs better than SSC in some
other classes, which correspond to articulating objects (e.g., Lizard and Octopus). From
the bar chart, we can see that SSC complements IDSC well. Figure 6(b) shows the
class-specific gain in Bullseye score when SSC is fused with IDSC, over IDSC alone. We
can see a significant gain in the Bullseye score for a number of classes. Most of the
classes that show a gain are those in which the objects have an overall visual
similarity, and many objects in these classes have a number of indentations in their
contours. Figure 6(c) shows the class-specific Bullseye score when IDSC is fused with
SSC. The scores are much more even across the classes.

Figure 6: (a) Class-specific Bullseye score for IDSC (top) and SSC (bottom). We can see
that SSC complements IDSC: SSC performs better than IDSC for classes 21 through 32,
while IDSC performs better than SSC for some other classes. (b) Percentage gain in
Bullseye score for each class when SSC is fused with IDSC, over IDSC alone. We can see a
significant improvement in the Bullseye score for classes 21 through 32, which
correspond to classes with visually similar objects that have many indents in their
contours. (c) Class-specific Bullseye score for IDSC+SSC. The bar chart shows a much
more even score across the classes.

Figure 7 shows a comparison of the retrieval results for an example object. The first
object is the query object and the rest are its 40 best-matching objects. Objects with a
green bounding box are correct retrievals and objects with a red bounding box are
incorrect retrievals. As can be seen from the figure, IDSC retrieves just 3 correct
objects, while SSC retrieves all 20 objects belonging to the same class as the query
object. Moreover, with SSC, all 20 objects lie in the top-20 locations.

In this paper, we tackle the case where objects have minor and major indents in their
contours. Our method works even when there are breaks in contours, such as the character
'M' shown in Figure 1. Fusing our method with the costs of IDSC means that SSC will take
over whenever IDSC performs poorly, and vice versa. So, in the case of major protrusions
from the shape's boundary, the cost from IDSC will take over. We show below, from
experiments on the Kimia database, that fusing the two costs has a very small negative
effect on the overall results. To correctly match shapes with major protrusions, one
might employ the strand removal method from [15] and fuse a third cost in Eq. (6), as
done by its authors. No single method can correctly match all types of objects, which is
why recent works (see Section 2) have adopted the fusion of two or more costs. The
results that we show in Table 1 are obtained by fusing just two costs. We expect the
Bullseye score to increase further if a third cost (from strand removal), or even more
complementary costs [15], are fused together.

We use IDSC as the base algorithm since its code and cost matrix are readily available.
From Table 1, we can see that [10] produces the best Bullseye score without manifold
learning. However, as mentioned in Section 2, [10] decomposes the object into multiple
convex parts and performs affine normalization of the individual parts. Doing so would
cause unwanted partitioning of objects such as those shown in Figures 1 and 2. If we
used the method in [10], we would expect a high matching cost when the top-left object
in Figure 1 is matched with the second object in the same figure. We believe that if SSC
were combined with [10], it would improve its Bullseye score in the same way that it
currently improves that of IDSC. The method of [10] is not designed for shapes with
indentations (such as those in Figure 2), which our method is suited for; combining the
two methods should therefore produce a better overall cost, as the two methods are
otherwise compatible. SSC, being complementary to IDSC, would also be complementary to
the articulation-invariant representation used in [10]. Thus, combining SSC with [10]
could yield state-of-the-art results on the MPEG7 database before manifold learning.

We also calculated the percentage of correct retrievals among the top-20 locations for
IDSC and IDSC+SSC. When IDSC is used alone, it gives a correct retrieval percentage of
76.96%, while IDSC+SSC gives 83.78%. In addition, we calculated the average first
position of a wrongly classified shape: for each query shape, we find the rank of the
first wrongly classified shape in its retrieval list, and average this rank over all
shapes. This average was found to be 14.43 for IDSC and 16.1371 for IDSC+SSC. Since
there are 20 objects in each class, the best possible value is 21, so the closer this
number is to 21, the better. We can see that IDSC makes mistakes much earlier in the
retrieval ordering than IDSC+SSC.
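Both statistics in this paragraph can be computed from a pair-wise distance matrix and class labels. The sketch below is illustrative only. It assumes the query is kept in its own retrieval list, so that a class of 20 objects gives a best possible first-wrong rank of 21; the names `top_k_correct_rate` and `first_wrong_positions` are hypothetical.

```python
import numpy as np


def top_k_correct_rate(dist, labels, k=20):
    """Fraction of same-class shapes among the k best matches of each query."""
    dist = np.asarray(dist, dtype=float)
    labels = np.asarray(labels)
    order = np.argsort(dist, axis=1, kind="stable")[:, :k]
    return (labels[order] == labels[:, None]).mean()


def first_wrong_positions(dist, labels):
    """1-based rank of the first wrongly classified match for every query."""
    dist = np.asarray(dist, dtype=float)
    labels = np.asarray(labels)
    order = np.argsort(dist, axis=1, kind="stable")
    wrong = labels[order] != labels[:, None]
    n = dist.shape[1]
    # argmax returns the first True; rows without any mistake get rank n + 1
    return np.where(wrong.any(axis=1), wrong.argmax(axis=1) + 1, n + 1)
```

Averaging the output of `first_wrong_positions` over all queries gives the single number reported per method.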

In Section 2, we mentioned certain works that improved the retrieval results by allowing
similar shapes to influence the pair-wise scores [21], [17], [18], [20]. These methods
try to learn the underlying shape manifold structure and thus learn a better geodesic
distance between two shapes. They take the pair-wise similarity matrix and perform
diffusion on it. To end up with a better similarity matrix, it is ideal to start with a
good one. A good similarity matrix is not necessarily one that produces a good Bullseye
score; rather, it is one that has a low cost for similar shapes and a high cost for
dissimilar shapes. This means that the costs between similar shapes are, to begin with,
much closer to the true geodesic distance on the shape manifold. Our similarity matrix
has these properties.

Figure 7: [Best viewed in colour] (a) Retrieval example from IDSC. We can see that just
3 of the top 40 best-matching objects belong to the same class. (b) Retrieval example
from SSC. All 20 objects from the same class as the query object have been retrieved,
and all 20 lie in the top-20 locations.

Figure 8: [Best viewed in colour] Precision-recall curves for IDSC, IDSC+SSC, and
IDSC+SSC+LCDP. We can see that the IDSC+SSC curve is clearly above the IDSC curve for
all recalls.

We use the Locally Constrained Diffusion Process (LCDP) [21] to perform the diffusion on
our augmented matrix. The use of LCDP increases the Bullseye score of IDSC+SSC from
91.83% to 98.85%. The improvement of the Bullseye score to close to 100% suggests that
the matrix we started with had good pair-wise similarity scores. The state-of-the-art
result of 99.99% shown in Table 1 was achieved when diffusion was performed on the
matrix from [10], which has a higher Bullseye score though it is not perceptually
motivated. Moreover, that diffusion was performed using a Tensor Product Graph (TPG)
affinity learning procedure [19], which uses higher-order relations between shapes. We,
on the other hand, show results obtained by performing diffusion using LCDP, which uses
just single-order relations between shapes, on a matrix that was obtained by augmenting
the IDSC matrix. We use LCDP because it facilitates comparison with other techniques
that also use LCDP [14], [15], [16], [21], and because its source code is available.
However, TPG could be used for learning the shape manifold as well.

In Figure 8, we plot the precision-recall curves, as in [35]. The curves compare the
precision of IDSC with that of IDSC+SSC over various recall levels. From the figure, we
can clearly see that IDSC+SSC has a better precision than IDSC alone at all recall
levels. We also plot the curve for IDSC+SSC+LCDP.
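One way to obtain such a curve from a pair-wise distance matrix is sketched below. The exact protocol of the cited work is not described here, so this is an assumption: precision is measured at the rank where the m-th same-class shape is retrieved, for every m, and then averaged over all queries. The query is kept in its own ranking, all classes are assumed to have the same size, and `precision_recall_curve` is a hypothetical name.

```python
import numpy as np


def precision_recall_curve(dist, labels):
    """Mean precision at each recall level m / class_size, m = 1..class_size."""
    dist = np.asarray(dist, dtype=float)
    labels = np.asarray(labels)
    order = np.argsort(dist, axis=1, kind="stable")
    relevant = labels[order] == labels[:, None]
    counts = relevant.sum(axis=1)
    if not np.all(counts == counts[0]):
        raise ValueError("all classes must have the same number of shapes")
    class_size = int(counts[0])
    found = np.arange(1, class_size + 1)
    precision = np.zeros(class_size)
    for row in relevant:
        # 1-based ranks at which the same-class shapes appear
        ranks = np.flatnonzero(row) + 1
        precision += found / ranks
    precision /= len(relevant)
    recall = found / class_size
    return recall, precision
```

Plotting `precision` against `recall` for each cost matrix (for example with matplotlib) gives one curve per method.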

We use IDSC as the base technique, and LCDP as the diffusion technique, since the code
for both algorithms is readily available. We stress that the costs obtained from our
algorithm can be fused with the costs from any other algorithm. Finally, we tested our
descriptor primarily on the MPEG7 database as it is one of the most challenging shape
databases. Other databases, such as the Kimia database, the Natural Silhouette Database,
and the ETH-80 Shape Database, are composed of relatively simple shapes and have far
fewer objects than the MPEG7 database. Moreover, only the MPEG7 database has
perceptually similar shape classes with vastly different contour properties. We do,
however, compare the average first position of a wrongly classified shape for the Kimia
dataset [2]. When IDSC is used alone, the average first position of a wrongly classified
shape is 11.5455, while that for IDSC+SSC is 11.3939; 12 is the ideal score, as there
are 11 objects in each class of the Kimia database. This shows that even though the
Kimia database does not have objects with intrusions in their contours (in fact, it has
objects with major protrusions), fusing SSC with IDSC has a very small negative effect
on the overall results.

5. Conclusions and Future Work 

In this paper, we identified certain problems that traditional contour-based shape
matching techniques face. We showed that shape interiors play an important role in
object recognition, and proposed a perceptually motivated variant of the well-known
Shape Context descriptor, which captures the shape properties in their entirety. We
showed the benefits of modelling the interior properties of the shape using Dense
Points. We proposed a new way of sampling from within a shape boundary in order to
capture the interior properties of the shape. We then listed the advantages of using the
convex hull of a shape to select the landmark points. We also showed how augmenting
traditional shape-matching techniques with the costs from our SSC descriptor can
significantly improve the retrieval rates.

As for future research directions, we see a need for a database consisting of shapes
that humans identify based on the Gestalt properties of the Human Visual System. This
area of research has not been explored well in the community, even though we encounter
such objects many times in our daily lives. For instance, characters are often written
using the "stencil font": most road markings use a stencil font to convey messages
(final example of Figure 1), and the logos of many companies are based on it.

We hope that this work will motivate other researchers in the community to take the area
of shape matching to the next level by developing other perceptually motivated
techniques.
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